Text Clustering with Fuzzy Measure of Descriptors Weight
Abstract
This work implements a new two-dimensional descriptor for text mining. Morphosyntactic analysis of words with natural language processing techniques normally discards additional information; rather than neglect it, we place it in a new dimension. Doing so requires rewriting the descriptor weights in documents with a new "fuzzy" measure. Applying this approach to an Arabic corpus, we transform the words of each text into a set of (root, pattern) pairs that serve as the descriptors of the corpus. Morphosyntactic analysis returns all possible analyses of a word rather than a single solution, so we then apply a Hidden Markov Model as a morphosyntactic post-analysis step to select the most likely analysis given the word's context. We show that this achieves higher precision than the conventional Vector Space Model representation and Latent Semantic Analysis in the context of Arabic text clustering.
Keywords: Text Mining, Text Processing, Hidden Markov Models, Natural Language Processing, Fuzzy Logic
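The disambiguation step can be illustrated with a minimal Viterbi sketch, assuming a first-order Hidden Markov Model whose hidden states are patterns and whose candidates are (root, pattern) pairs. The function name `viterbi_disambiguate`, the transliterated labels, and every probability below are hypothetical and chosen only for illustration; the abstract does not specify the authors' model or their fuzzy weighting, so neither is reproduced here.

```python
import math


def _log(p):
    """Logarithm that maps zero probability to negative infinity."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi_disambiguate(candidates, start, trans, floor=1e-6):
    """Return the most likely sequence of (root, pattern) analyses.

    candidates: one non-empty dict per word, mapping each candidate
                (root, pattern) pair to a lexical probability.
    start:      dict mapping a pattern to its probability at position 0.
    trans:      dict mapping (previous_pattern, pattern) to a probability.
    floor:      probability assumed for unseen start/transition events.
    """
    # best[i][a] = (log score of the best path ending in analysis a, backpointer)
    best = [{
        a: (_log(start.get(a[1], floor)) + _log(p), None)
        for a, p in candidates[0].items()
    }]
    for i in range(1, len(candidates)):
        layer = {}
        for a, p in candidates[i].items():
            score, prev = max(
                (best[i - 1][b][0] + _log(trans.get((b[1], a[1]), floor)), b)
                for b in best[i - 1]
            )
            layer[a] = (score + _log(p), prev)
        best.append(layer)

    # Backtrack from the best final analysis.
    last = max(best[-1], key=lambda a: best[-1][a][0])
    path = [last]
    for i in range(len(candidates) - 1, 0, -1):
        last = best[i][last][1]
        path.append(last)
    return path[::-1]


# Toy input: the first word has two candidate analyses, the second has one.
candidates = [
    {("ktb", "fa3ala"): 0.4, ("ktb", "fu3ul"): 0.6},
    {("drs", "fa3l"): 1.0},
]
start = {"fa3ala": 0.5, "fu3ul": 0.5}
trans = {("fa3ala", "fa3l"): 0.9, ("fu3ul", "fa3l"): 0.1}

result = viterbi_disambiguate(candidates, start, trans)
print(result)
```

In this toy input the lexical probabilities alone favor the second analysis of the first word, but the transition probabilities make the first analysis the more likely one in context, which is the kind of context-based selection the abstract describes.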
Similar Resources
Document Clustering Based on Ontology and a Fuzzy Approach
Data mining, also known as knowledge discovery in databases, is the process of discovering unknown knowledge from large amounts of data. Text mining applies data mining techniques to extract knowledge from unstructured text. Text clustering, one of the important techniques of text mining, is the unsupervised classification of similar documents into different groups. The most important step...
Content-based Dynamic Email Spam Detecting Using Fuzzy Granular Computing Approach
Spam detection is a significant problem that many researchers have addressed with a variety of strategies. The best spam detection technique should scan the content of messages to find spam. This research concerns the development of a certain category of granular computing as a classifier for spam detection. In this research, Fuzzy Granular Computing Classifica...
A Document Weighted Approach for Gender and Age Prediction Based on Term Weight Measure
Author profiling is a text classification technique used to predict the profiles of unknown texts by analyzing their writing styles. Author profiles are characteristics of the authors such as gender, age, native language, country, and educational background. Existing approaches to author profiling suffer from problems such as high dimensionality of features and fail to capture th...
DESIGN AND IMPLEMENTATION OF FUZZY EXPERT SYSTEM FOR REAL ESTATE RECOMMENDATION
Information extraction and imprecise query answering from web documents
Word-based searches for relevant information in texts retrieve a huge collection and burden the user with information overload. Ontology-based text information retrieval can perform concept-based search and extract only the relevant portions of text containing concepts that are present in the query or are semantically linked to query concepts. While these systems have better precision ...
Publication date: 2014